METHOD AND APPARATUS FOR IMPROVING OUTPUT SKEW

Field of the Invention 

The present invention generally relates to the field of integrated circuit devices, and more
particularly relates to the generation of signals across an output bus of such a device.

Background 

As the processing speeds of computer systems have continually increased, there has
been a corresponding need for faster and faster random access memory (RAM) devices. RAM
devices, such as dynamic random access memory (DRAM) devices, are typically used as the
main memory in computer systems. While DRAM devices have gotten faster over the years, the
operating speeds of DRAM devices still lag behind the operating speeds of the processors which
access the DRAM devices. Consequently, the relatively slow access and cycle times of DRAM
devices slow down the processors, and create bottlenecks.

In response to the need for faster DRAM devices, synchronous dynamic random access
memory (SDRAM) devices have been developed. SDRAM devices operate synchronously with
the system clock which drives the processor that accesses the devices, with the input and output
data of the SDRAM devices being synchronized to an active edge of the system clock. The
initial SDRAM devices can be referred to as single data rate (SDR) SDRAM devices since their
peak data rate is equal to the rate at which commands can be clocked into the devices. Single
data rate SDRAMs are currently in widespread use.

To provide still faster DRAM devices, double data rate (DDR) SDRAM devices have
been developed to provide twice the memory data bandwidth of SDR SDRAMs. The term DDR
refers to the fact that the peak data rate is twice the rate at which commands can be clocked into
the devices. DDR SDRAM devices typically allow commands to be entered on the positive edge
of the system clock, and allow data transfers on both the rising and falling edges of the system
clock to provide twice as much data as an SDR SDRAM device. DDR SDRAM devices typically
employ a 2n-prefetch architecture, in which the internal data bus is twice the width of the 

Atty. Docket No. 303.736US1 1 Client Ref. No. 00-0915 



external data bus. With this architecture, each read access cycle internal to the device provides 
two external data words, and each write access cycle internal to the device writes two combined 
external data words into the device. 

In a purely synchronous system, output data (and capture of the output data by a 
memory controller) would be referenced to a common free-running system clock. In such a 
system, the maximum data rate would be reached when the sum of the output access time and the 
flight time approaches the bit time. Although the data rates could be increased by generating 
delayed clocks for early data launch and/or late data capture, these data rates would still be
limited because these techniques do not account for the fact that the data valid window (i.e., the 
"data eye") moves relative to any fixed clock signal due to changes in temperature, voltage or 
loading. To allow for even higher data rates, data strobe signals were added to DDR SDRAM 
devices. The data strobe signals are non-free-running signals driven by the device driving the 
data (i.e., the DDR SDRAM devices for READs, and the memory controller for WRITEs). For
READs, the data strobe signal is effectively an additional output having a predetermined pattern.
For WRITEs, the data strobe signal is used by the SDRAM device as a clock in order to capture
the corresponding input data. 

Referring to FIG. 1, a data output timing diagram 10 for an existing DDR SDRAM 
device illustrates the relationship between the bidirectional data strobe signal and the data 
input/output signals for an exemplary READ operation (e.g., a four-word burst). In this example, 
the DDR SDRAM is assumed to be a 64 Mb x8 DDR SDRAM device available from Micron 
Technology, Inc. The CK and CK# signals represent differential system clock inputs, the DQS 
signal represents the data strobe signal, and the DQ signals represent the data input/output signals 
forming the device data bus. The DQS signal includes preamble, toggling, and postamble 
portions. The preamble portion provides a timing window for the receiving device to enable its 
data capture circuitry with a known/valid level present on the DQS signal. After the preamble 
portion, the DQS signal toggles in the toggling portion at the same frequency as the CK signal 
for the duration of the four-word data burst. Each high transition and each low transition of the 
DQS signal is associated with one data word, provided by the DQ signals driven by the DDR 
SDRAM device. In the postamble portion, the DQS signal goes low to indicate the end of the
data burst to the receiving device. Thus, as shown, the data words are transmitted at twice the
frequency of the system clock CK.

As illustrated in FIG. 1, the DQS signal is nominally edge-aligned with all of the DQ 
signals such that all of these output signals will transition at the output pins of the DDR SDRAM 
device at nominally the same time. The memory controller will then internally delay the DQS 
signal to the center of the received data eye upon capturing the data. The edge-alignment of the 
DQS and the DQ signals occurs because these output signals are all clocked out of the DDR 
SDRAM device by the same internal clock signal. Ideally, the DQS and DQ signals would all be
perfectly aligned. However, as also shown in FIG. 1, the transitions of the DQS and DQ signals
include a spread or distribution in time, which is due to both static effects (e.g., internal routing
mismatch) and dynamic effects (e.g., data pattern and simultaneously switching outputs (SSO)).
Even if critical signals are properly laid out on the die (e.g., using matching trace lengths),
inherent differences in the package leadfingers' parasitics will contribute to this spread between
the DQS and DQ signals, which is referred to as "output skew". The output skew is specified by
a parameter known as tDQSQ, which is the pin-to-pin skew measured at the DQS and DQ outputs
of the device (i.e., the time between the transition of the DQS signal and the last DQ data valid).
This skew (or |tDQSQ|) region is a region of uncertainty since at least one of the output signals is
not valid. It is noted that the DQS signal may transition first, last, or somewhere in the middle of
the DQ transition window. Maximum tDQSQ is currently specified as 500 psec.

The data word being read will be valid once the latest DQ signal in the group has
transitioned, and will remain valid until the earliest DQ signal transitions as part of the next data
word, or upon completion of the burst. The duration of this data valid window is specified by the
tDV parameter, as shown in FIG. 1. The time between the transition of the DQS signal to the first
DQ signal going non-valid is referred to as tQH (also shown). As is suggested by FIG. 1, output
skew tDQSQ has an adverse impact on data valid window tDV. In particular, a relatively long output
skew region will cause the data valid window to be relatively short. Since the memory controller
can only capture data during the data valid window tDV, the output skew tDQSQ will also adversely
impact the data capture operation.

Thus, although the addition of data strobe signals allowed for increased data rates, the
operating speeds of existing DDR SDRAM devices are still limited by the output skew specified 
by the tDQSQ parameter. In particular, the output skew limits the operating speed (e.g., access and
cycle times) of DDR SDRAM devices. Therefore, it would be desirable to provide a method and 
apparatus for reducing skew across the output data bus of a DDR SDRAM device, thereby 
enlarging the data eye for data capture by the memory controller. It would also be desirable to 
provide a method and an apparatus for reducing skew across multiple output signals in other 
memory device types, and other integrated circuit devices. 
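
The effect of output skew on the data valid window described above can be illustrated with a short calculation. The sketch below is illustrative only and assumes that the data valid window equals the hold time (the time from the DQS transition to the first DQ going non-valid) less the output skew (the time from the DQS transition to the last DQ becoming valid), which follows from the definitions given above. The function name and the 3000 psec hold time are hypothetical values chosen for the example, not device specifications.

```python
def data_valid_window_ps(t_qh_ps: float, t_dqsq_ps: float) -> float:
    """Return the data valid window in picoseconds.

    Assumption: window = hold time - output skew, per the definitions in
    the text. The result is clamped at zero because a negative window
    simply means there is no interval in which all DQ signals are valid.
    """
    return max(0.0, t_qh_ps - t_dqsq_ps)


# Hypothetical hold time, chosen only for illustration.
T_QH_PS = 3000.0

# Shrinking the output skew enlarges the window available for data capture.
for skew_ps in (500.0, 250.0, 100.0):
    window_ps = data_valid_window_ps(T_QH_PS, skew_ps)
    print(f"skew = {skew_ps:.0f} psec, data valid window = {window_ps:.0f} psec")
```

Under these assumptions, every picosecond removed from the output skew is returned directly to the data valid window.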

Summary of the Invention 

According to one aspect of the invention, a synchronous integrated circuit device having 
an output bus for outputting a plurality of output signals includes a clock input buffer, a delay 
line coupled to the clock input buffer, and an output circuit coupled to the delay line. The clock 
input buffer receives a system clock signal and generates a buffered clock signal. The delay line 
receives the buffered clock signal and generates a delayed clock signal. The output circuit 
includes a plurality of output signal paths for outputting the plurality of output signals
synchronously with the system clock signal by using the delayed clock signal. At least one of 
the output signal paths includes a delay circuit and an output buffer coupled to the delay circuit. 
Each delay circuit provides a programmable delay to the delayed clock signal to generate a 
unique delayed clock signal which is used for clocking an output signal into the respective output 
buffer. 

According to another aspect of the invention, a method of outputting output signals on 
an output bus of a synchronous integrated circuit device with decreased output skew includes 
receiving a system clock signal, delaying the system clock signal to generate a delayed clock 
signal, and applying the delayed clock signal to a plurality of output signal paths. In each of the 
output signal paths, the method includes using the delayed clock signal to output the output 
signals synchronously with the system clock signal. In at least one of the output signal paths, the 
method further includes providing a programmable delay to the delayed clock signal to generate 
a unique delayed clock signal used for clocking an output signal out from the respective output 
signal path.

Other aspects of the present invention will be apparent upon reading the following
detailed description of the invention and viewing the drawings that form a part thereof.

Brief Description of the Drawings

FIG. 1 is a data output timing diagram for an existing double data rate (DDR) 
synchronous dynamic random access memory (SDRAM) integrated circuit device; 

FIG. 2 is a circuit block diagram of a DDR SDRAM device having decreased output 
skew in accordance with one embodiment of the present invention; 

FIG. 3 is a circuit block diagram of a DDR SDRAM device having decreased output 
skew in accordance with another embodiment of the present invention; 

FIG. 4 is a block diagram showing one embodiment of a variable delay circuit for use in 
each of the output paths of the DDR SDRAM device shown in FIG. 2 or FIG. 3; 

FIG. 5 is an exemplary data output timing diagram for the DDR SDRAM device of FIG. 
2, wherein the variable delay element for one DQ signal is dynamically modified in order to 
decrease the output skew between that DQ signal and the DQS signal; and 

FIG. 6 is a flowchart showing a method of decreasing output skew in an integrated 
circuit device which generates multiple output signals (e.g., a DDR SDRAM device). 

Detailed Description 

In the following detailed description, reference is made to the accompanying drawings, 
which form a part hereof, and in which is shown by way of illustration specific embodiments in 
which the present invention may be practiced. These embodiments are described in sufficient 
detail to enable those skilled in the art to practice the present invention, and it is to be understood 
that the embodiments may be combined, or that other embodiments may be utilized and that 
structural, logical and electrical changes may be made without departing from the spirit and the
scope of the present invention. The following detailed description is, therefore, not to be taken in 
a limiting sense, and the scope of the present invention is defined by the appended claims and 
their equivalents.

Referring to FIG. 2, an exemplary synchronous integrated circuit device 100 in
accordance with one embodiment of the present invention comprises a clock input circuit 102, a
clock delay circuit 104, and an output circuit 106. In this example, device 100 comprises a
double data rate (DDR) synchronous dynamic random access memory (SDRAM) device having
improved output skew in comparison with previous DDR SDRAM devices. This DDR SDRAM
device may be similar to one of the DDR SDRAM devices available from Micron Technology,
Inc., except for the features described herein. For example, the DDR SDRAM device may be
similar to an MT46V8M8, 64 Mb, x8 DDR SDRAM device available from Micron Technology,
Inc., except for the features described herein. This DDR SDRAM device is configured as a 2 M
x 8 x 4 banks DDR SDRAM. Additional background information for the MT46V8M8 DDR
SDRAM device is provided by its data sheet, entitled "Double Data Rate (DDR) SDRAM,
64Mb: x4, x8, x16 DDR SDRAM", Micron Technology, Inc., 2000, and from the article entitled
"DDR SDRAM Functionality and Controller Read Data Capture", Micron DesignLine, Vol. 8,
Issue 3, 3Q99. Both of these documents are incorporated herein by reference in their entirety.

In other embodiments, the apparatus and methods for improving output skew that are
disclosed herein may be used in other types of DDR SDRAM devices having other
configurations. Alternatively, the disclosed apparatus and methods may be used in other
synchronous memory devices, or other synchronous integrated circuit devices, for use in
improving output skew for a plurality of output signals which are output on an output bus. The
apparatus and methods are described herein in reference to a particular DDR SDRAM device for
convenience only, and the invention should not be limited to such a device.

In one embodiment, clock input circuit 102 includes a clock input buffer 108. Clock
input buffer 108 has an input node for receiving an external or system clock signal (XCLK) 110,
and an output node for generating an internal or buffered clock signal (CLKIN) 112. Clock input
buffer 108 provides an inherent delay having a value of A. The detailed implementation of clock
input buffer 108 will depend on the particular application.

In one embodiment, clock delay circuit 104 includes a delay locked loop (DLL) 114
coupled between clock input buffer 108 and output circuit 106. DLL 114 is configured to
receive buffered clock signal (CLKIN) 112 from clock input buffer 108, and to generate a
delayed clock signal (XDLL) 116. In the embodiment of FIG. 2, DLL 114 includes a delay line
118, a phase detector 120, and an A + B delay model 122. Delay line 118 has an input node for
receiving buffered clock signal (CLKIN) 112, and an output node for generating delayed clock
signal (XDLL) 116. Delay line 118 is configured to delay buffered clock signal (CLKIN) 112 by
an adjustable amount under the control of a DLL control signal 124 to generate delayed clock
signal (XDLL) 116. Phase detector 120 has a first input node for receiving buffered clock signal
(CLKIN) 112, a second input node for receiving an internal DLL clock signal (CLKDLL) 126,
and an output node for generating DLL control signal 124. Phase detector 120 is configured to
detect the phase difference between buffered clock signal (CLKIN) 112 and internal DLL clock
signal (CLKDLL) 126, and to generate DLL control signal 124 based upon the phase difference.
DLL control signal 124 is then applied to delay line 118 to control the amount of delay. Delay
model 122 has an input node coupled (indirectly) to delayed clock signal (XDLL) 116, and an
output node coupled to the CLKDLL input node of phase detector 120. Delay model 122 models
the sum of the delays of input circuit 102 (i.e., A) and output circuit 106 (i.e., B).

Clock delay circuit 104 is thus configured to provide a delay, having a value of C,
which is substantially equal to the period of system clock signal (XCLK) 110 less the sum of the
delays of input circuit 102 and output circuit 106. In other words, clock delay circuit 104
provides a delay having a value of C = tXCLK - (A + B). By providing delay C, clock delay circuit
104 will cause output signal transitions to appear at the outputs of device 100 in nominal
alignment with the XCLK transitions at the input of device 100. For example, if tXCLK = 7.5 nsec
(i.e., an XCLK frequency of 133 MHz), A = 1.5 nsec and B = 3.5 nsec, then C = 7.5 nsec - (1.5 nsec +
3.5 nsec) = 2.5 nsec. By providing such a delay, clock delay circuit 104 will cause the output
signals of device 100 to transition one (1) clock cycle (i.e., 7.5 nsec) after a transition of XCLK,
such that the output signals will be aligned with the next transition of XCLK. While delays A
and B will vary with voltage and temperature, DLL 114 will vary the value of the delay provided
by clock delay circuit 104 (i.e., delay C) in order to keep the output signals synchronous with
system clock signal (XCLK) 110.
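
The delay relationship above can be checked with a short calculation. The following sketch is illustrative only: the function name is hypothetical, and the second set of A and B values is an assumed example of drift with voltage and temperature rather than measured data. It covers only the single-cycle case described above, in which A + B is less than one clock period.

```python
def dll_delay_ns(t_xclk_ns: float, a_ns: float, b_ns: float) -> float:
    """Return delay C = tXCLK - (A + B), in nanoseconds.

    A is the input buffer delay and B is the output circuit delay.
    Only the single-cycle case from the text is handled here.
    """
    c_ns = t_xclk_ns - (a_ns + b_ns)
    if c_ns < 0:
        raise ValueError("A + B exceeds one clock period in this sketch")
    return c_ns


# Values from the example in the text: 7.5 nsec period (133 MHz clock).
print(dll_delay_ns(7.5, 1.5, 3.5))

# Assumed drift of A and B: the delay C shrinks so that A + C + B
# still equals one clock period.
print(dll_delay_ns(7.5, 1.7, 3.8))
```

In both cases the sum A + C + B remains one clock period, which is the condition the phase detector and the A + B delay model work together to maintain.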

In one embodiment, delayed clock signal (XDLL) 116 would be coupled directly to the
input node of A + B delay model 122, and the delay provided by delay line 118 would be equal
to the delay provided by DLL 114. In another embodiment, as shown in FIG. 2, DLL 114 also
includes a clock multiplexer 128 and a DQ multiplexer driver 130, which are collectively
referred to herein as a clock driver circuit 132. Clock driver circuit 132 has an input node
coupled to delay line 118, and a pair of output nodes 134 coupled to output circuit 106 and to A
+ B delay model 122. Clock driver circuit 132 is configured to receive delayed clock signal
(XDLL) 116, to multiplex XDLL 116 into differential delayed clock signals (CLKDQ,
CLKDQL) 136, and to drive the differential delayed clock signals to generate a rising-edge
delayed clock signal (DLLRO) and a falling-edge delayed clock signal (DLLFO) on output nodes
134. Clock driver circuit 132 can thus be used to meet fanout requirements for connecting the
DLLRO/DLLFO signals to output circuit 106. The total amount of delay provided by clock delay
circuit 104 in this embodiment (i.e., C) is the delay provided by delay line 118 plus the inherent
delay of clock driver circuit 132.

The generation of both rising-edge and falling-edge delayed clock signals (DLLRO and
DLLFO) may be advantageous for a DDR SDRAM device, where data is clocked into and out of
the device on both the rising and falling edges of the system clock signal 110. In particular, these
DLLRO/DLLFO signals can advantageously be used to output first and second data words
synchronously with rising and falling edges of system clock signal 110.

In other embodiments, clock delay circuit 104 includes different types of delay locked
loops. For example, circuit 104 may comprise a digital DLL, an analog DLL, a continually
locked loop, a periodically calibrated delay line, etc. Further, clock delay circuit 104 may or may
not include a clock driver circuit, and may or may not generate both rising-edge and falling-edge
delayed clock signals, depending on the particular application.

Output circuit 106 has one or more input nodes coupled to clock delay circuit 104 for
receiving one or more delayed clock signals. The received delayed clock signal(s) may include
delayed clock signal (XDLL) 116, or both rising-edge and falling-edge delayed clock signals
(DLLRO/DLLFO) 134. For simplicity, the remainder of this description assumes that output
circuit 106 receives the DLLRO/DLLFO signals, as shown in FIG. 2. As described further below,
output circuit 106 includes a plurality (i.e., n) of output signal paths configured to output the
plurality of output signals synchronously with system clock signal (XCLK) 110 by using delayed
clock signals DLLRO/DLLFO. In the case of device 100 being a DDR SDRAM device, the n
output signal paths include one output signal path for outputting a bidirectional data strobe signal
DQS, and (n-1) output signal paths for outputting (n-1) data input/output signals DQs, in
response to a read command. For a x8 DDR SDRAM, n would equal nine (9), and the nine
output data paths would include one output signal path for the DQS signal, and eight output data
paths for the eight DQ signals.

In one embodiment, each of the n output signal paths of output circuit 106 includes a 
variable or programmable delay circuit 138 and an output buffer 140. Each delay circuit 138 has 
two input nodes for receiving rising-edge and falling-edge delayed clock signals 
(DLLRO/DLLFO) 134, and has two output nodes for generating unique rising-edge and falling- 
edge delayed clock signals (DLLROn and DLLFOn) 142. Each delay circuit 138 is configured to 
provide a programmable delay to delayed clock signals (DLLRO/DLLFO) 134 to generate unique 
delayed clock signals (DLLROn/DLLFOn) 142 for the nth output signal path. In this 
embodiment, the amount of delay provided by each of the delay circuits 138 is independent of 
the amount of delay provided by any of the other delay circuits 138. Each of the unique delayed
clock signals (DLLROn/DLLFOn) 142 is applied to the output buffer 140 of that particular output 
signal path, and is used to clock an output signal (i.e., one of the DQ signals or the DQS signal) 
into the respective output buffer. Each output buffer 140 then provides the output signal to an 
output pad 144, which is typically connected to a pin (i.e., one of the DQ pins or the DQS pin) on 
the integrated circuit package for device 100. 

The delays provided by delay circuits 138 are programmed to decrease output skew 
across the n output signals. Note that, if all of the delay circuits 138 (i.e., the delay circuits for 
all of the n output signal paths) were programmed to the same delay value, then the transitions of 
the DQ and DQS output signals could still include a relatively large spread in time due to factors 
such as internal routing mismatch, data pattern and simultaneously switching outputs, and 
inherent differences in the package leadfingers' parasitics. By independently programming each 
delay circuit 138, however, the factors contributing to output skew can be compensated for, and 
the output skew of device 100 can be reduced.

In one embodiment, output circuit 106 also includes delay control logic 146 for
dynamically programming delay circuits 138 during operation of device 100. Delay control logic
146 is in a feedback path from the DQ and DQS output signals to delay circuits 138. Delay
control logic 146 has input nodes to receive the n output signals 148 from output buffers 140 (or
from other nodes within the output signal paths, such as output pads 144), and has output nodes
for generating delay control signals 150 for delay circuits 138. Delay control logic 146 is
configured to determine output skew between the DQ and DQS signals, and to generate delay
control signals 150 so as to reduce, minimize or eliminate the skew between the DQ and DQS
output signals 148 (or the DQ and DQS signals at pads 144).

In one embodiment, delay control circuit 146 is configured to determine the slowest 
(i.e., worst case) DQ or DQS output signal. The delay circuit 138 corresponding to this slowest 
output signal is set to a zero or minimal delay value. Then, delay control circuit 146 detects the 
output skew between each of the other DQ or DQS signals and the slowest output signal, and 
individually programs the delay circuit 138 corresponding to this other DQ or DQS signal based 
upon the output skew detected for that DQ or DQS signal. For example, if delay control circuit 
146 determines that the DQ3 signal is the slowest output signal, and that the output skew 
between the DQS and the DQ3 signals is 100 psec (i.e., DQS is 100 psec ahead of DQ3), then
delay control circuit 146 generates the delay control signal 150 for the delay circuit 138 for the
DQS signal so as to cause a delay of about 100 psec. This 100 psec delay of the DQS signal will
cause the DQ3 and DQS signals to become aligned. Delay control signals 150 are similarly
generated for all of the other DQ and DQS output signals. Thus, by independently controlling 
the DLLROn/DLLFOn signals 142, the output skew across all of the DQ and DQS output signals 
can be decreased, thereby enlarging the data eye for data capture by the memory controller. The 
decreased output skew also allows for increased operating speed (e.g., faster access and cycle 
times) for integrated circuit device 100. 
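
The slowest-signal approach described above can be sketched in a few lines. The sketch below is illustrative only: the function name and the measured arrival times are hypothetical, and the arrival times are expressed relative to an arbitrary common reference, with a larger value meaning a later (slower) transition.

```python
from typing import Dict


def align_to_slowest(arrival_ps: Dict[str, float]) -> Dict[str, float]:
    """Return a delay setting (psec) per signal that aligns all signals
    with the slowest one. The slowest signal receives zero added delay."""
    slowest_ps = max(arrival_ps.values())
    return {name: slowest_ps - t for name, t in arrival_ps.items()}


# Hypothetical measurements: DQ3 is slowest, and DQS is 100 psec ahead of it.
arrivals = {"DQS": 0.0, "DQ0": 40.0, "DQ3": 100.0, "DQ7": 75.0}
delays = align_to_slowest(arrivals)

for name in arrivals:
    aligned_ps = arrivals[name] + delays[name]
    print(f"{name}: add {delays[name]:.0f} psec, transitions at {aligned_ps:.0f} psec")
```

Because delay can only be added in this approach, every output ends up transitioning at the time of the slowest output.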

In another embodiment, delay control circuit 146 defines a reference output signal path, 
such as that for the DQS output signal (although any of the output signal paths may be defined as 
the reference path). The delay circuit 138 for this reference output signal path is set to a 
midpoint delay value. The midpoint delay value may be set in the middle of the delay values that
the delay circuit 138 is capable of providing (i.e., the 50% delay value), or may be set at some
other point between the minimum and maximum delay values (e.g., a 25% or 75% delay). Then,
delay control circuit 146 detects the output skew between each of the non-reference DQ or DQS
signals and the reference output signal, and individually programs the delay circuit 138 for this
non-reference DQ or DQS signal based upon the output skew detected for that DQ or DQS
signal. The delay circuit 138 for any non-reference DQ or DQS signal slower than the reference
signal is set to a delay value less than the midpoint delay (i.e., to speed up that non-reference
signal), and the delay circuit 138 for any non-reference DQ or DQS signal faster than the
reference signal is set to a delay value more than the midpoint delay (i.e., to slow down that
non-reference signal). If, for example, delay control circuit 146 defines the DQS signal as the
reference, and determines that the DQ3 signal is 100 psec slower than the DQS signal, the delay
circuit 138 for the DQ3 signal is set to a delay value 100 psec less than the midpoint delay. If, on
the other hand, delay control circuit 146 finds that the DQ3 signal is 50 psec faster than the DQS
signal, then the delay circuit 138 for the DQ3 signal is set to a value 50 psec more than the
midpoint delay. In either case, the DQ3 signal will become aligned with the DQS signal. Delay
control signals 150 are similarly generated for all of the other non-reference output signals.
Thus, any DQ or DQS signal can be sped up or slowed down to match any other DQ or DQS
signal. Therefore, by independently controlling the DLLROn/DLLFOn signals, the output skew
across all of the DQ and DQS output signals can be decreased.
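
The reference-path approach described above can be sketched as follows. The sketch is illustrative only: the function name, the 200 psec midpoint, the 400 psec maximum delay, and the measured arrival times are all hypothetical. Clamping each setting to the available delay range is an added assumption of the sketch; the text does not state how an out-of-range skew would be handled.

```python
from typing import Dict


def align_to_reference(
    arrival_ps: Dict[str, float],
    reference: str,
    midpoint_ps: float,
    max_delay_ps: float,
) -> Dict[str, float]:
    """Return a delay setting (psec) per signal, centered on a midpoint.

    Signals slower (later) than the reference get less than the midpoint
    delay; faster (earlier) signals get more. Settings are clamped to the
    range [0, max_delay_ps] (an assumption of this sketch).
    """
    ref_ps = arrival_ps[reference]
    settings = {}
    for name, t in arrival_ps.items():
        wanted_ps = midpoint_ps + (ref_ps - t)
        settings[name] = min(max(wanted_ps, 0.0), max_delay_ps)
    return settings


# Hypothetical measurements: DQ3 is 100 psec slower than DQS,
# and DQ5 is 50 psec faster than DQS.
arrivals = {"DQS": 0.0, "DQ3": 100.0, "DQ5": -50.0}
settings = align_to_reference(arrivals, "DQS", midpoint_ps=200.0, max_delay_ps=400.0)

for name, value in settings.items():
    print(f"{name}: delay setting {value:.0f} psec")
```

Starting every path at a midpoint setting is what allows a path to be sped up as well as slowed down, at the cost of adding the midpoint delay to all outputs.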

It should be understood that the embodiments of delay control circuit 146 described
herein are exemplary, and that other embodiments of delay control circuit 146 may be used.

In order for delay control circuit 146 to detect output skew across the DQ and DQS
output signals, the DQ and DQS signals should be simultaneously transitioning (e.g., both
transitioning high, or both transitioning low). While the DQS signal of DDR SDRAM devices is
defined so as to toggle during its toggling portion at the same frequency as the system clock
signal for the duration of a read data burst, the DQ signals may or may not toggle, depending
upon the particular data values that are being read. If, for example, the DQn signal were to
remain at a logic 0 throughout a read data burst, or were to remain at a logic 1 throughout the
read data burst, then delay control circuit 146 would be unable to compare transitions of the DQS
signal to transitions of the DQn signal, and would therefore be unable to detect the output skew
between the DQn and DQS signals. In this case, delay control circuit 146 would be unable to
program delay circuits 138 during this data burst.

In one embodiment, device 100 uses an initialization mode of operation to ensure that
delay control circuit 146 has an opportunity to dynamically program delay circuits 138. As
indicated by manufacturer data sheets, some DDR SDRAM devices include an initialization
mode during which the DQ and DQS output signals are not valid, and should be ignored. For
example, with the MT46V8M8 DDR SDRAM device, users are required to wait for at least 200
system clock cycles after issuing a reset command (i.e., a DLL_RST command) before issuing
another command to the device. During this 200 clock cycle initialization period, this
embodiment of device 100 is configured to toggle the DQ and DQS signals. Although these
signals should be ignored by users, output circuit 106 samples the output skew during this period,
and dynamically generates delay control signals 150 to properly program delay circuits 138 to
minimize output skew. The programmed delay circuits 138 can then be used to minimize output
skew of the DQ and DQS signals after the initialization period ends. The programming of delay
circuits 138 may be maintained until a subsequent re-initialization of the DDR SDRAM device,
or until another point in time when delay control circuit 146 is able to determine the output skew
for use as feedback.

In another embodiment, delay control circuit 146 performs dynamic sampling to
determine output skew between DQ and DQS signals on any given simultaneous transitions of
these signals. For example, delay control circuit 146 may determine output skew between any
DQ signal and the DQS signal whenever that DQ signal and the DQS signal both have a rising
edge, both have a falling edge, or both have either a rising edge or a falling edge. If, for
example, delay control circuit 146 only determines output skew on simultaneous rising edges (or
simultaneous falling edges), the same output skew could be used to program the delay circuit 138
for both the DLLROn and DLLFOn signals. To ensure that delay control circuit 146 has an
opportunity to program delay circuits 138 in this embodiment, user software may be required to
read appropriate data patterns from device 100 at appropriate times (e.g., periodically during
operation). This embodiment may be combined with the previously-described embodiment, such
that delay circuit programming will occur during initialization, and will then be periodically
updated during operation.
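
The condition under which output skew can be sampled dynamically (i.e., the DQ and DQS signals making the same transition at the same bit time) can be sketched as below. The sketch is a simplified software illustration, not a description of the circuit: it assumes each signal is represented by one logic level per bit time, and the function name and sample patterns are hypothetical.

```python
from typing import List, Tuple


def simultaneous_edges(dqs_bits: List[int], dq_bits: List[int]) -> List[Tuple[int, str]]:
    """Return (bit index, edge kind) wherever DQS and DQ both rise or both fall.

    Only these bit times offer a pair of matching transitions whose timing
    difference could be measured as output skew.
    """
    edges = []
    for i in range(1, min(len(dqs_bits), len(dq_bits))):
        dqs_step = dqs_bits[i] - dqs_bits[i - 1]  # +1 rising, -1 falling, 0 none
        dq_step = dq_bits[i] - dq_bits[i - 1]
        if dqs_step == 1 and dq_step == 1:
            edges.append((i, "rising"))
        elif dqs_step == -1 and dq_step == -1:
            edges.append((i, "falling"))
    return edges


dqs = [0, 1, 0, 1, 0]          # DQS toggles every bit time during the burst
dq_toggling = [0, 1, 1, 1, 0]  # hypothetical data with two usable transitions
dq_constant = [0, 0, 0, 0, 0]  # data held at logic 0: nothing to compare

print(simultaneous_edges(dqs, dq_toggling))
print(simultaneous_edges(dqs, dq_constant))
```

The constant pattern yields no usable edges, which corresponds to the case noted above in which the delay circuits cannot be programmed during a burst and appropriate data patterns must be read instead.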

Referring to FIG. 3, another exemplary synchronous integrated circuit device 200 in
accordance with another embodiment of the present invention comprises a clock input circuit
202, a clock delay circuit 204, and an output circuit 206. While device 200 is again a DDR
SDRAM device, the apparatus and methods for improving output skew in device 200 may be
used in other types of memory devices, or integrated circuit devices. Clock input circuit 202 and
clock delay circuit 204 have the same structure and operation as clock input circuit 102 and clock
delay circuit 104, described above. However, while output circuit 106 of device 100
dynamically programs the delays provided by delay circuits 138 in the output signal paths during
operation of device 100, output circuit 206 of device 200 is configured to set the amount of delay
in each output signal path in a static fashion.

In one embodiment, output circuit 206 includes a plurality of output signal paths for 
outputting a plurality of output signals synchronously with system clock signal (XCLK) 110 by 
using delayed clock signal (XDLL) 116 or, as shown in FIG. 3, by using rising-edge and 
falling-edge delayed clock signals (DLLRO/DLLFO) 134. Again, for convenience, it is assumed 
output circuit 206 uses the DLLRO/DLLFO signals, as shown. In the case of a DDR SDRAM, output 
circuit 206 includes n output signal paths, with the DQS signal being output by one output signal 
path and (n-1) DQ signals output on (n-1) output signal paths. 

Each output signal path includes a variable delay circuit 208 and an output buffer 210. 
Each delay circuit 208 is coupled to clock delay circuit 204 to receive delayed clock signals 
(DLLRO/DLLFO) 134 therefrom. Each delay circuit 208 provides a variable delay to delayed 
clock signals DLLRO/DLLFO to generate unique delayed clock signals (DLLROn/DLLFOn) 212, 
for use in clocking output signals into the nth output buffer 210. For example, in a x8 DDR 
SDRAM device, there are nine (9) output signal paths, with eight (8) used to output the eight (8) 
DQ signals and one used to output the DQS signal. The signals output by output buffers 210 are 
then applied to a respective output pad 214. 

The programming of variable delay circuits 208 is performed statically such that, once 
the programming has been performed, delay circuits 208 provide static delays. In one 
embodiment, the programming of delay circuits 208 takes place during the manufacturing 
process, during which output skew of device 200 is measured, and used to permanently configure 
delay circuits 208 to add or subtract delay so as to decrease output skew across the output 
signals. To reduce or eliminate output skew, delay circuits 208 may be configured to slow down 
each output signal path to match the speed of the slowest (i.e., worst case) output signal path, or 
to slow down or speed up each output signal path to match the speed of a reference output signal 
path (e.g., the DQS output signal path), in a manner similar to that described above for the 
operation of delay control circuit 146. In another embodiment, the intrinsic delay of each output 
signal path is estimated, modeled or measured during the design process for the integrated circuit 
device, and delay circuits 208 are each designed to provide an appropriate amount of delay to 
reduce or eliminate output skew. For example, once the signal routing paths on device 200 have 
been designed and are known, the intrinsic delay provided by each signal routing path can be 
determined, and then used to configure delay circuits 208 to provide appropriate amounts of 
delay. Note that, while static programming of delay circuits 208 can be employed to effectively 
reduce or eliminate output skew due to static factors, such as internal routing mismatch, such 
programming is less likely to be effective to reduce or eliminate output skew due to dynamic 
factors, such as skew due to data patterns and simultaneously switching outputs. 
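
The two static alignment strategies described above (matching the slowest path, or matching a reference path) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name, the dictionary representation of the paths, and the picosecond values in the usage note are hypothetical and are not taken from the description.

```python
from typing import Dict, Optional


def static_delay_settings(intrinsic_ps: Dict[str, float],
                          reference: Optional[str] = None) -> Dict[str, float]:
    """Compute the delay change each output signal path needs.

    intrinsic_ps: measured or modeled intrinsic delay of each path.
    reference: if None, every path is aligned to the slowest path, so all
        results are >= 0 (delay is only added). Otherwise every path is
        aligned to the named path, and a negative result means delay must
        be removed from that path's nominal setting.
    """
    if reference is None:
        target = max(intrinsic_ps.values())
    else:
        target = intrinsic_ps[reference]
    return {name: target - delay for name, delay in intrinsic_ps.items()}
```

For example, with hypothetical intrinsic delays of 120 ps (DQS), 100 ps (DQ0), and 150 ps (DQ1), aligning to the slowest path adds 30, 50, and 0 ps respectively, while aligning to DQS adds 20 ps to DQ0 and removes 30 ps from DQ1. The second strategy therefore requires each delay circuit to have some nominal delay that can be reduced.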

Note that, once the DQ and DQS signals have been de-skewed, the trimmable option 
fuses that are present in some current DDR SDRAM devices can be configured (e.g., blown) to 
shift the data window to optimize the access time (i.e., tAC) of the DDR SDRAM device. The 
access time (tAC) of DDR SDRAM devices is defined as the access window of the DQS from the 
clock signal (i.e., the difference in time between a clock edge and the related signal transition that 
occurs farthest away from that clock edge in time). By de-skewing the DQ and DQS signals of 
device 100, and by trimming device 100, the access time of device 100 can be lowered below the 
access times of the current devices. 

In the embodiments of FIGs. 2 and 3, the output signal paths for all of the DQ and DQS 
signals include a programmable delay circuit. Alternatively, in other embodiments, fewer than 
all of the output signal paths include such a programmable delay circuit. For example, in one 
embodiment, a first output signal path (e.g., for the DQS signal) provides a non-programmable 
amount of delay, and the other output signal paths (e.g., for the DQ signals) include 
programmable delay circuits which are programmed to reduce output skew with respect to the 
first path. The non-programmable amount of delay of the first path may be an intrinsic delay due 
only to internal routing of that first path, or may be due to both internal routing of that first path 
and a fixed delay circuit coupled within that first path. 

Referring to FIG. 4, a simplified block diagram shows one embodiment of a variable 
delay circuit 300 for use in the output signal paths of device 100 or device 200. Delay circuit 300 
may be used to provide a unique delay to delayed clock signal (XDLL) 116, or to rising-edge and 
falling-edge delayed clock signals (DLLRO/DLLFO) 134, depending upon the particular 
application. Delay circuit 300 includes an input node 302, a plurality of delay stages 304 
arranged in series, and an output node 306. Each delay stage 304 includes a delay element, such 
as a pair of inverters 308, and a switching arrangement to selectively switch the delay element 
into and out of the operative circuit. If, for example, each of switches SW1, SW2, SW3, SW4, 
. . . , and SWm is in a first position as shown in FIG. 4, delay circuit 300 will provide a maximum 
amount of delay equal to the sum of the delays provided by each delay element. If, on the other 
hand, the SWm switch is then moved into its second position, then input node 302 will be 
connected directly to output node 306, and delay circuit 300 will provide a minimum amount of 
delay. By selectively controlling switches SW1 through SWm to switch different delay stages 
304 in and out, delay circuit 300 can provide different amounts of delay under the control of the 
switches. 
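
The behavior of delay circuit 300 can be modeled with a short Python sketch. Because FIG. 4 is not reproduced here, the switch topology below is only one reading of the text: it assumes that moving a switch to its second position connects input node 302 directly to the output of that stage, bypassing that stage and every earlier one. That reading is consistent with the statement that moving SWm alone yields the minimum delay, but it remains an assumption, as do the function and parameter names.

```python
from typing import Sequence


def delay_circuit_300(stage_delays_ps: Sequence[float],
                      second_position: Sequence[bool]) -> float:
    """Total delay from the input node to the output node.

    stage_delays_ps[k]: delay of the delay element in stage k+1.
    second_position[k]: True if switch SW(k+1) is in its second position.
        In this assumed topology that switch feeds the input node in
        directly after stage k+1, bypassing stages 1 through k+1.
    """
    if len(stage_delays_ps) != len(second_position):
        raise ValueError("one switch per delay stage is expected")
    # Only stages after the last switch in its second position contribute.
    last_bypass = -1
    for k, bypassed in enumerate(second_position):
        if bypassed:
            last_bypass = k
    return sum(stage_delays_ps[last_bypass + 1:])
```

With four hypothetical 100 ps stages, all switches in the first position give 400 ps, moving only SW2 gives 200 ps, and moving SW4 (the last switch) gives 0 ps in this idealized model, which ignores the delay of the switches themselves.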

In the dynamic programming embodiment of device 100, delay control signals 150 
provided by delay control circuit 146 are used to control the states of switches SW1 through 
SWm. In the static programming embodiment of device 200, the states of these switches may be 
permanently set using metal, fuses, antifuses, or other circuit elements. Note that the number of 
delay stages 304 in delay circuit 300, and the amount of delay provided by each delay stage, will 
depend upon the particular application. Generally, by increasing the number of delay stages, and 
decreasing the delay associated with each delay stage, finer resolution can be achieved and the 
amount of output skew can be decreased. Also note that each delay stage can be configured to 
provide a different amount of delay. 

It should be understood that variable delay circuit 300 shown in FIG. 4 is merely 
illustrative of the many types of variable delay circuits that are known in the art, and many other 
types of variable delay circuits could also be used with the present invention. 
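
The resolution trade-off noted for delay stages 304 can be made concrete with a small Python sketch. It assumes equal-delay stages and a delay circuit that is programmed to the nearest achievable value; both assumptions, along with the function names and the numbers in the usage note, are illustrative only.

```python
def programmed_delay(required_ps: float, step_ps: float, num_stages: int) -> float:
    """Closest delay achievable with num_stages equal stages of step_ps each."""
    steps = round(required_ps / step_ps)
    steps = max(0, min(num_stages, steps))  # clamp to the available range
    return steps * step_ps


def residual_skew(required_ps: float, step_ps: float, num_stages: int) -> float:
    """Absolute skew remaining after the delay circuit has been programmed."""
    return abs(required_ps - programmed_delay(required_ps, step_ps, num_stages))
```

For a hypothetical required correction of 130 ps, four 100 ps stages leave 30 ps of residual skew, whereas sixteen 25 ps stages (the same 400 ps total range) leave 5 ps. In this model the residual is bounded by half the stage delay whenever the required correction lies within the circuit's range.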

Referring to FIG. 5, an exemplary output data timing diagram for device 100 illustrates 
the dynamic modification of the variable delay circuit 138 for one of the DQ signals in order to 
decrease the output skew between that DQ signal and the DQS signal. In this example, it is 
assumed that the DQS signal has been selected as a reference signal having a midpoint delay, and 
that the timing of the DQ signal will be modified based upon output skew between the DQ signal 
and DQS signal in order to decrease the output skew. 

In response to the first rising edge of the DLLRO signal, the delay circuits 138 for both 
the DQ and DQS signals are assumed to provide a variable delay of tVD1, thereby simultaneously 
generating the unique DLLROn signals for the DQ and DQS signals (i.e., DLLROd and DLLROg, 
respectively). While the DLLROd and DLLROg signals occur simultaneously, output skew 
introduced in the output signal paths for these two signals causes the rising edge of the DQ signal 
to lead the rising edge of the DQS signal by an amount tDQSQ-R, which is the output skew between 
these signals. Similarly, in response to the first rising edge of the DLLFO signal, the delay 
circuits 138 for both the DQ and DQS signals are assumed to provide a variable delay of tVD2, 
thereby simultaneously generating the unique DLLFOn signals for the DQ and DQS signals (i.e., 
DLLFOd and DLLFOg, respectively). While the DLLFOd and DLLFOg signals occur 
simultaneously, output skew introduced in the output signal paths for these two signals causes 
the falling edge of the DQ signal to lead the falling edge of the DQS signal by an amount tDQSQ-F 
(i.e., output skew). 

The output skew between the DQ and DQS signals on both the rising and falling edges 
is detected by delay control circuit 146, which then adjusts the variable delay provided by the 
delay circuit 138 of the DQ signal to slow down that DQ signal by an appropriate amount to 
reduce the output skew relative to the reference DQS signal. As shown in FIG. 5, in response to 
the second rising edge of the DLLRO signal, the delay circuit 138 for the DQS signal still 
provides the delay tVD1 (which was not adjusted since the DQS signal is acting as a reference), 
but the delay circuit 138 for the DQ signal now provides a delay of tVD3, thereby generating the 
unique DLLROn signal for the DQ signal (i.e., DLLROd) only after the unique DLLROn signal 
for the DQS signal (DLLROg). The additional delay provided by the delay circuit 138 for the DQ 
signal now compensates for the timing difference in the output signal paths for these two signals, 
and causes the rising edge of the DQ signal to be aligned with the rising edge of the DQS signal, 
thereby reducing or eliminating the output skew between these signals. Similarly, in response to 
the second rising edge of the DLLFO signal, the delay circuit 138 for the DQS signal still 
provides the delay tVD2, but the delay circuit 138 for the DQ signal now provides a delay of tVD4, 
thereby generating the unique DLLFOn signal for the DQ signal (i.e., DLLFOd) only after the 
unique DLLFOn signal for the DQS signal (DLLFOg). The additional delay provided by the 
delay circuit 138 for the DQ signal now compensates for the timing difference in the output 
signal paths for these two signals, and causes the falling edge of the DQ signal to be aligned with 
the falling edge of the DQS signal, thereby reducing or eliminating the output skew between 
these signals. Thus, the DQ and DQS signals have now been aligned, and the output skew 
between these signals has been reduced or eliminated. Note that the timing diagram shown in 
FIG. 5 is merely illustrative, and the actual timing diagram would depend upon the particular 
implementation of the circuits. 
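
The adjustment illustrated in FIG. 5 amounts to adding the measured skew to the DQ path's variable delay while leaving the reference DQS path alone. The Python sketch below shows that update for one edge type; the function names, the idealized single-step correction, and the numeric values in the usage note are assumptions for illustration, not a description of how delay control circuit 146 is built.

```python
def output_edge(clock_edge_ps: float, variable_delay_ps: float,
                path_delay_ps: float) -> float:
    """Time at which an output edge appears for a given delayed-clock edge."""
    return clock_edge_ps + variable_delay_ps + path_delay_ps


def update_dq_delay(dq_delay_ps: float, dq_edge_ps: float,
                    dqs_edge_ps: float) -> float:
    """Return the new DQ variable delay after one skew measurement.

    DQS acts as the reference and is never adjusted. When DQ leads DQS
    (dq_edge_ps < dqs_edge_ps) the DQ delay grows by the measured skew;
    when DQ lags, it shrinks, but never below zero.
    """
    skew_ps = dqs_edge_ps - dq_edge_ps  # positive when DQ leads DQS
    return max(0.0, dq_delay_ps + skew_ps)
```

As a worked example with hypothetical numbers: if both paths start with a 200 ps variable delay, and the DQ and DQS paths contribute 300 ps and 340 ps respectively, the first edges appear at 500 ps and 540 ps, so DQ leads by 40 ps. The update raises the DQ delay to 240 ps, and on the next delayed-clock edge both outputs appear 540 ps after that edge. A practical circuit would be limited by the resolution of its delay stages and might converge over several measurements rather than one.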

Referring to FIG. 6, a method 400 of decreasing output skew in a synchronous 
integrated circuit device such as device 100 in accordance with one embodiment of the present 
invention is shown. Method 400 includes receiving a system clock signal (at 402), delaying the 
system clock signal to generate a delayed clock signal (at 404), and applying the delayed clock 
signal to a plurality of output signal paths (at 406). In each of the output signal paths, method 
400 also includes providing a programmable delay to the delayed clock signal to generate a 
unique delayed clock signal (at 408), and using the unique delayed clock signal for that output 
signal path to clock out an output signal (at 410). Each programmable delay is provided to 
decrease output skew across the output signals. 
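
The steps of method 400 can be traced in a short Python sketch that computes when each output signal appears. It is a behavioral model only, under stated assumptions: delays simply add, and the function names, dictionary layout, and values in the usage note are hypothetical.

```python
from typing import Dict


def method_400(system_clock_edge_ps: float,
               clock_delay_ps: float,
               programmable_delay_ps: Dict[str, float],
               path_delay_ps: Dict[str, float]) -> Dict[str, float]:
    """Return the time at which each output signal appears.

    402: receive the system clock edge.
    404: delay it to form the delayed clock.
    406: apply the delayed clock to every output signal path.
    408: add each path's programmable delay to form its unique delayed clock.
    410: clock out the output signal, which then incurs the path's own delay.
    """
    delayed_clock = system_clock_edge_ps + clock_delay_ps       # 402, 404
    outputs = {}
    for name, extra in programmable_delay_ps.items():           # 406
        unique_clock = delayed_clock + extra                    # 408
        outputs[name] = unique_clock + path_delay_ps[name]      # 410
    return outputs


def output_skew(outputs: Dict[str, float]) -> float:
    """Spread between the earliest and latest output signals."""
    return max(outputs.values()) - min(outputs.values())
```

With hypothetical path delays of 340 ps (DQS), 300 ps (DQ0), and 320 ps (DQ1), programming every delay to zero leaves 40 ps of output skew, while programming 0, 40, and 20 ps respectively brings all three outputs to the same instant.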

Conclusion 

Thus, an apparatus and method for reducing skew across the output data bus of a DDR 
SDRAM device have been described herein. By reducing output skew, the data eye for the memory 
controller has been enlarged, and limits on operating speed of the device due to output skew can be 
reduced to allow for faster operation. An apparatus and method for reducing skew across multiple 
data output signals in other memory device types, and across multiple output signals in other 
integrated circuit devices, have also been described. 

The above description is intended to be illustrative, and not restrictive. Many other 
embodiments will be apparent to those of ordinary skill in the art. For example, an apparatus or 
method in accordance with the present invention may be used in other types of memory devices, or 
other integrated circuit devices. Also, different types of input circuits, delay circuits, and output 
circuits may be used. Further, the apparatus and method of the present invention may be configured 
to sample output skew only on the rising edges of the output signals, or only on the falling edges, 
or on both the rising and falling edges. Also, the delays provided by the programmable delay 
circuits may be programmed statically and/or dynamically, and the programmable delay circuits may 
be provided in all or only a portion of the output signal paths. Different types of variable delay 
elements may be used, and may provide different lengths of delays and different resolutions of delay. 
The scope of the present invention should therefore be determined with reference to the appended 
claims, along with the full scope of equivalents to which such claims are entitled. 
